Demographic stochasticity refers to chance events of individual mortality and reproduction, such as inevitable deviation in mean birth and death rates (Lande et al., 2003). Demographic stochasticity is only significant in small populations, and we will explain this in three examples.
# Toss a coin 10/100/10,000 times
# Generate random numbers from {0,1} with the same probability
# We assume: 1 - heads, 0 - tails
toss10 <- sample(c(0,1), size = 10, prob = c(0.5,0.5), replace = TRUE)
toss100 <- sample(c(0,1), size = 100, prob = c(0.5,0.5), replace = TRUE)
toss10k <- sample(c(0,1), size = 10000, prob = c(0.5,0.5), replace = TRUE)
# Record the results in a matrix
# Notice: sum = number of 1's = number of heads
results <- data.frame(c(sum(toss10), 10-sum(toss10)),
c(sum(toss100), 100-sum(toss100)),
c(sum(toss10k), 10000-sum(toss10k)))
colnames(results) <- c("10 tosses", "100 tosses", "10,000 tosses")
rownames(results) <- c("heads", "tails")
results
## 10 tosses 100 tosses 10,000 tosses
## heads 7 52 5051
## tails 3 48 4949
Environmental stochasticity often refers to temporal fluctuations in the probability of mortality and reproduction (Lande et al., 2003), which is often driven directly or indirectly by weather (e.g. unpredictable catastrophes).
Sampling error (or sampling variance) is the measurement error in estimates of population size or density. Some researchers categorized it as a basic form of stochasticity (Lande et al., 2003), while some distinguished it from deterministic and stochastic factors (Mills, 2007).
To answer the question why population size tends to shrink because of stochasticity, we need to understand two important concepts–arithmetic and geometric means.
Let \(\lambda_A\) and \(\lambda_G\) denote the arithmetic and geometric mean, respectively. We have the following definitions: \[\lambda_A=\frac{1}{k}\sum_{i=1}^k\lambda_i,\] \[\lambda_G=\left(\prod_{i=1}^k\lambda_i\right)^{\frac{1}{k}}.\]
# Initialize
N0 <- 1
lambda <- matrix(rbinom(1600, 1, 0.5), ncol = 16)
lambda[lambda==1] <- 1.55
lambda[lambda==0] <- 0.55
# lambda is a 100-by-16 matrix
# Each row represents a possible population change in 16 years
# Calculate the outcomes, and plot
outcome <- N0*apply(lambda, 1, prod)
par(mfrow=c(1,2))
boxplot(outcome, main = "Boxplot with outliers",
ylab = "Final population size")
boxplot(outcome, outline = FALSE, main = "Boxplot without outliers")
It would be helpful to know the conversion between the arithmetic mean of \(r\) and the geometric mean of \(\lambda\). Recall that \(r=\ln\lambda\), so \[\ln(\lambda_1\lambda_2\cdots\lambda_k)=\ln\lambda_1+\ln\lambda_2+\cdots+\ln\lambda_k=r_1+r_2+\cdots+r_k,\] which gives \[\ln(\lambda_1\lambda_2\cdots\lambda_k)^{1/k}=\frac{1}{k}\ln(\lambda_1\lambda_2\cdots\lambda_k)=\frac{1}{k}(r_1+r_2+\cdots+r_k),\] and thus \[\ln\lambda_G=r_A.\]
An interesting but less intuitive fact is that when fixing the arithmetic mean of growth rates, highly variable growth rates will more likely result in a smaller final population. In other words, increasing the variance of the growth rates (\(\sigma_\lambda^2\)) makes the geometric mean growth rate less than the arithmetic mean. Let \(\lambda_t=\lambda_A+\epsilon_t\), where \(\epsilon_t\) is the deviation of \(\lambda_t\) from the arithmetic mean \(\lambda_A\) with zero mean. Using the Taylor expansion of \(\ln(1+x)\), one could obtain \[\begin{aligned}\ln\lambda_t & =\ln\lambda_A+\ln(1+\epsilon_t/\lambda_A)\\~ & =\ln\lambda_A+ \epsilon_t/\lambda_A-(\epsilon_t/\lambda_A)^2/2+O(\epsilon_t^3),\end{aligned}\] where \(O(\epsilon_t^3)\) denotes the higher order terms. Hence, taking the expectation of both sides gives \[r_A=E(\ln\lambda_t)\cong\ln\lambda_A-\frac{E[(\lambda_t-\lambda_A)^2]}{2\lambda_A^2}=\ln\lambda_A-\frac{\sigma_\lambda^2}{2\lambda_A^2},\] which further gives \[\lambda_G\cong\exp\left(\ln\lambda_A-\frac{\sigma_\lambda^2}{2\lambda_A^2}\right).\] In fact, the geometric mean is always no larger than the arithmetic mean (and they are equal if and only if every term is the same).
Above examples assume that \(r_t\) does not depend on previous growth rates, nor will it influence subsequent growth rates. The autocorrelation describes the relationship between \(r_t\) and \(r_{t+\tau}\), its value at a time lag \(\tau\). One way to incorporate teporal autocorrelation is to: \[r_{t+\tau}=r_A+\rho(r_{t}-r_A)+\epsilon_{t+\tau},\] where \(\rho\) is the coefficient of lag-\(\tau\) autocorrelation, and \(\epsilon_t\sim N(0,\sigma_\epsilon^2)\) is white noise with zero mean and constant variance. An example would be the case \(\tau=1\) (lag-1 autocorrelation), where \(r_{t+1}=r_A+\rho(r_t-r_A)+\epsilon_{t+1}\). When \(\rho=0\), \(r_{t+1}=r_A+\epsilon_{t+1}\) and there is no temporal autocorrelation. Here we present examples of exponential growth rates with zero autocorrelation and positive lag-1 autocorrelation.
# We assume that the mean exponential growth rate r_A = 0.3,
# and the noise term epsilon_t is a standard normal random variable,
# We look at a time period of 20 years
rA <- 0.3
e <- rnorm(20, mean = 0, sd = 1)
r <- data.frame(year = 1:20, zero = rA + e)
# Calculate exponential growth rates with positive lag-1 autocorrelation
# We compare different coefficients rho = 0.2, 0.5, 0.8
pos1 = replicate(20, rA + e[1])
for (i in 2:20) {
pos1[i] <- r[i,'zero'] + 0.2*(pos1[i-1] - rA)
}
r$pos1 <- pos1
pos2 = replicate(20, rA + e[1])
for (i in 2:20) {
pos2[i] <- r[i,'zero'] + 0.5*(pos2[i-1] - rA)
}
r$pos2 <- pos2
pos3 = replicate(20, rA + e[1])
for (i in 2:20) {
pos3[i] <- r[i,'zero'] + 0.8*(pos3[i-1] - rA)
}
r$pos3 <- pos3
# Plot the results
matplot(r[2:5], type = "b", pch=1, col = 1:4,
main = 'Exponential growth rates with different lag-1 autocorrelation coefficients',
xlab = 'year', ylab = 'exponential growth rate', lty = c(1,2,2,2))
legend(x = "bottomright", legend = c('rho = 0', 'rho = 0.2', 'rho = 0.5', 'rho = 0.8'),
col = 1:4, lty = c(1,2,2,2))
Now we move on to an important section, where we try to estimate the growth rate from data, assuming some very simple model. But first, let’s talk about different types of errors we might encounter.
Process error results from variation in true population size due to biotic or abiotic processes (Ahrestani et al., 2013). Environmental and demographic stochasticities are examples of process errors.
When only process error exists, the population at each time \(t\), \(N_t\), is known and accurate. The growth rate \(\lambda_t\) is a random variable. For example, a geometric model with only process error can be described as \[\begin{aligned}N_{t+1} & =\lambda_tN_t,\\\lambda_t & \sim N(\bar\lambda,\sigma_p^2),\end{aligned}\] where \(\lambda_t\) follows a normal distribution with mean \(\bar\lambda\) and variance \(\sigma_p^2\).
Since the population at each time \(t\), \(N_t\), is known and accurate, we can calculate the estimated growth rate using geometric mean \[\hat\lambda=\left(\prod_{i=1}^t\frac{N_{i}}{N_{i-1}}\right)^{\frac{1}{t}},\] or equivalently, \[\hat r=\frac{1}{t}\sum_{i=1}^t\ln\frac{N_i}{N_{i-1}}.\]
We can notice that the estimated growth rate is only related to the initial and the final population size, as all the terms between them can be cancelled out. In other words, \[\hat\lambda=\left(\frac{N_t}{N_0}\right)^{\frac{1}{t}}\] and \[\hat r=\frac{\ln N_t-\ln N_0}{t}.\]
Here is a toy example: suppose we observe the following number of fish in a lake during the first 8 years. Assuming a geometric model, what is the average growth rate?
# Create a dataframe
N <- data.frame(year = 0:8,
size = c(100, 120, 98, 152, 298, 302, 383, 575, 728))
# Calculate annual growth rates
lambda <- replicate(8, 0)
for (i in 1:8) {
lambda[i] <- N[i+1,'size']/N[i,'size']
}
# Calculate the geometric mean growth rate
lambda_avg <- prod(lambda)^(1/8)
# The average growth rate is 1.2816, which is the same as (728/100)^(1/8)
# Calculate the theoretical population size, assuming geometric growth
N$pred <- 100*lambda_avg^N$year
# Plot the results
plot(N$year, N$size, pch = 19, col = 'blue', main = 'Population growth with process error only',
xlab = 'year', ylab = 'size')
lines(N$year, N$pred, col = 'red', lty = 2)
legend(x = 'topleft', legend = c('data', 'prediction'), col = c('blue', 'red'), lty = c(1, 2))
Observation error results from variation in the methodology used to obtain the population size (Ahrestani et al., 2013). Examples of observation error include difficulty in counting animals, which might due to lack of technical expertise, insufficient funding, etc.
When only observation error exists, the growth rate is accurate. A possible geometric model with only observation error can be described as \[\begin{aligned}\ln N_t & =\ln N_0+rt+\eta_t,\\\eta_t & \sim N(0,\sigma_o^2),\end{aligned}\] where \(\eta_t\) follows a normal distribution with mean 0 and variance \(\sigma_o^2\). Here we ignore the subscript of \(r\) as we assume the growth rate is some constant. We could also convert the above equation to \[N_{t+1}=\lambda^tN_te^{\eta_t},\] where \(e^{\eta_t}>0\) so that the population is always non-negative.
The equation \(\ln N_t=\ln N_0+rt+\eta_t\) is in the form of a linear model in which \(\ln N_t\) is the response variable and \(t\) is the predictor variable. Using simple linear regression, the slope of the fitted function is the estimated \(r\). Moreover, the \(y\)-intercept is the estimated \(\ln N_0\). Here we provide another toy example with the same data as the previous example on process error:
# Create a dataframe, and calculate the log-transformed population sizes
N <- data.frame(year = 0:8,
size = c(100, 120, 98, 152, 298, 302, 383, 575, 728))
N$log <- log(N$size)
# Run a linear regression with log-transformed population sizes as response variables
# and year as predictor variables
lm1 <- lm(log~year, data = N)
# After running the regression, we obtain the following coefficients:
# (Intercept) 4.40771
# year 0.26756
# Plot the results
plot(N$year, N$log, pch = 19, col = 'blue', main = 'Population growth with observation error only',
xlab = 'year', ylab = 'log-transformed size')
abline(lm1, col = 'red', lty = 2)
legend(x = 'topleft', legend = c('data', 'prediction'), col = c('blue', 'red'), lty = c(1, 2))
Drever, M. C., Clark, R. G., Derksen, C., Slattery, S. M., Toose, P. and Nudds, T. D. (2012) Population vulnerability to climate change linked to timing of breeding in boreal ducks, Global Change Biology, 18:480–492.
Figures: (top left) © Nrik Kiran, Mallard (Anas platyrhynchos), Creative Commons CC-BY-SA-4.0 license; (top right) © Judy Gallagher, American Wigeon (Anas americana), Creative Commons CC-BY-2.0 license; (bottom left) © MPF, Greater Scaup (Aythya marila), Creative Commons CC-BY-SA-4.0 license; (bottom right) © Andy Reago & Chrissy McClarren, Surf scoter (Melanitta perspicillata), Creative Commons CC-BY-2.0 license
Ahrestani, F., Hebblewhite, M. and Post, E. (2013) The importance of observation versus process error in analyses of global ungulate populations. Sci Rep, 3:3125.
Albon, S. D., Clutton-Brock, T. H. and Guinness, F. E. (1987) Early development and population dynamics in red deer. II. Density-independent effects and cohort variation, J. Anim. Ecol., 56: 69–81.
Beck-Johnson, L. M., Nelson, W. A., Paaijmans, K. P., Read, A. F., Thomas, M. B. and Bjonstad, O. N. (2013) The Effect of Temperature on Anopheles Mosquito Population Dynamics and the Potential for Malaria Transmission. PLoS ONE, 8(11):e79276.
Benton, T. G., Grant, A. and Clutton-Brock, T. H. (1995) Does environmental stochasticity matter? Analysis of red deer life-histories on Rum, Evolutionary Ecology, 9:559–574.
Beverton, R. and Holt, S. J. (1957) On the Dynamics of Exploited Fish Populations. Ministry of Agriculture, Fisheries and Food, London, UK.
Case, T. (2000) An Illustrated Guide to Theoretical Ecology, Oxford University Press.
Lande, R., Engen, S. and Saether, B. (2003) Stochastic Population Dynamics in Ecology and Conservation, Oxford Series in Ecology and Evolution.
Mills, L. S. (2007) Conservation of Wildlife Populations: Demography, Genetics, and Management, Wiley-Blackwell Publishing.